Nonlinear MMSE Multiuser Detection Based on 
Multivariate Gaussian Approximation 



Peng Hui Tan, Student Member, IEEE, Lars K. Rasmussen, Senior Member, IEEE * 

February 15, 2005 

Abstract 

In this paper, a class of nonlinear MMSE multiuser detectors is derived based on a multivariate
Gaussian approximation of the multiple access interference. This approach leads to
expressions identical to those describing the probabilistic data association (PDA) detector,
thus providing an alternative analytical justification for this structure. A simplification to the
PDA detector based on approximating the covariance matrix of the multivariate Gaussian
distribution is suggested, resulting in a soft interference cancellation scheme. Corresponding
multiuser soft-input, soft-output detectors delivering extrinsic log-likelihood ratios are
derived for application in iterative multiuser decoders. Finally, a large system performance
analysis is conducted for the simplified PDA, showing that the bit error rate performance of
this detector can be accurately predicted and related to the replica method analysis for the
optimal detector. Methods from statistical neuro-dynamics are shown to provide a closely
related alternative large system prediction. Numerical results demonstrate that for large systems,
the bit error rate is accurately predicted by the analysis and found to be close to optimal
performance.

1 Introduction



It is well-known that the computational complexity of individually optimal detection for direct- 
sequence code-division multiple-access (CDMA) grows exponentially with the number of users 
[1], as the computation of the marginal posterior-mode (MPM) distribution is required. Max- 
imum a posteriori probability (MAP) detection for each user is therefore far too complex for 
practical CDMA systems with even a moderate number of users. The exponentially growing 
complexity has inspired a considerable effort in finding low complexity suboptimal alternatives 
capable of resolving the detrimental effects of multiple-access interference (MAI). 

Interference cancellation (IC) strategies have been subject to particular attention due to low
complexity, a simple modular structure and competitive performance [2]. Early work was focused
on linear cancellation and hard decision cancellation [3,4]. More recently, soft decision
cancellation has been shown to provide performance improvements. In [5] it was shown
that soft decision cancellation based on convex projections provides an iterative solution to the
convex-constrained multiuser maximum-likelihood problem. The well-known result that the optimal
nonlinear minimum mean squared error (MMSE) estimate is the conditional posterior-mode
mean was used in [6] for a decision-feedback receiver. Similar arguments were used in [7]
to arrive at a soft decision IC structure, and the same structure was derived in [8] based on
neural network arguments. Even though this cancellation structure has a low complexity of order
$O(K^2)$, numerical examples show that near single-user performance can be achieved for large
systems [8].

In [9], the probabilistic data association (PDA) method was introduced for multiuser detec- 
tion as a low complexity nonlinear alternative. The decision statistics of the users are modelled 
as binary random variables where the MAI is approximated as multivariate Gaussian noise. The 
a posteriori probability (APP) for the data symbols of each user is updated sequentially given the 
associated APPs of all other users. Although this scheme has a low computational complexity of
order $O(K^3)$, it can achieve near single-user performance for systems with a moderate number
of users [9].

The most celebrated multiuser detectors applied for iterative multiuser decoding of coded 
CDMA are based on linear filtering, e.g., [10-17]. Parallel IC (PIC) and linear MMSE filtered 
PIC were investigated in [10-12] and [13, 14], respectively. In [15], it was observed that for
low-complexity detectors, information combining over iterations can be rewarding, providing
performance and system load gains. The partial cancellation structure in [15] was justified in 
[16] as recursive maximal ratio combining over all previous iterations, while a more complicated 
vector Kalman filter applied across iterations was presented in [17]. Nonlinear multiuser de- 
tectors based on list detection have been developed for iterative multiuser decoding and shown 
to provide equally impressive performance gains at low complexity [18]. As the PDA detector 
generates APPs directly, it has been applied for iterative multiuser decoding with only minor 
modifications, also demonstrating competitive gains [19]. 

Large system performance analysis techniques from statistical mechanics and statistical neuro- 
dynamics have been applied successfully for performance analysis of some multiuser detectors. 
In [20], the performance of the optimal multiuser detector was analyzed based on the replica 
method. This approach has further been developed in [21], and in [22] for coded CDMA. A 
different approach inspired by statistical neuro-dynamics was used in [23] to arrive at a large 
system analysis for a belief propagation (BP) multiuser detector. Methods from statistical neuro- 
dynamics [24, 25] have also been applied in [26] for large system analysis of PIC. 

In this paper, a class of nonlinear MMSE (NMMSE) multiuser detectors is derived based on
a multivariate Gaussian approximation of the MAI. The computation of the NMMSE estimate
requires a sum of terms whose number grows exponentially with the number of users. Using
the multivariate Gaussian approximation, this summation is replaced by integration, reducing the
complexity significantly. The expressions describing this approach are shown to be identical to
the description of the PDA detector in [9], thus providing an alternative analytical justification.

A simplification to the NMMSE/PDA detector¹, based on approximating the covariance matrix
of the multivariate Gaussian distribution with a diagonal matrix, is suggested. The corresponding
soft interference cancellation scheme is similar to the IC structure of the detectors in [7, 8] and
can be implemented in parallel or serially. The corresponding complexity is of the order of IC,
namely $O(K^2)$, as compared to the PDA with an order of complexity of $O(K^3)$.

Multiuser soft-input, soft-output (SISO) detectors delivering extrinsic log-likelihood ratios
(LLRs) at the output are derived from the class of NMMSE-based detectors. The multiuser SISO
detectors are applied for iterative multiuser decoding of coded CDMA and found to converge to
single-user performance at loads larger than linear multiuser SISO alternatives.

¹ In the remainder of the paper, this detector is referred to as the simplified PDA detector.

Finally, a large system performance analysis is conducted for the simplified PDA. In the large 
system limit, the bit error rate performance of this detector can be accurately predicted and related 
to the replica method analysis for the optimal detector [20]. Methods from statistical neuro- 
dynamics can also be used for a closely related alternative large system prediction [23, 24]. It 
follows that the simplified PDA has the same predicted large system performance as the optimal 
detector. Numerical results show that for large systems, the bit error rate (BER) is accurately 
predicted by the analysis and found to be close to optimal performance. 

The paper is organized as follows. In Section 2, the uncoded and coded CDMA discrete- 
time models are presented together with the standard iterative multiuser decoding structure. In 
Section 3 nonlinear minimum mean squared error estimation, leading to the marginal posterior- 
mode (MPM) decision, is briefly reviewed providing the setting for the multivariate Gaussian 
approximation considered in Section 4. The simplified PDA is derived in Section 5, while the 
corresponding NMMSE-based multiuser SISO detectors are detailed in Section 6. The large 
system analysis of the simplified PDA is derived in Section 7, numerical results are presented in 
Section 8 and concluding remarks are summarized in Section 9. 



2 System Model 

An elaborate discrete-time system model for CDMA is developed from first principles in [27]. 
The discrete-time model described below is a simplified, special case of this general model. For 
simplicity, assume a symbol-synchronous CDMA system with K users, binary data symbols 
and binary spreading with processing gain N. Random spreading is assumed where each binary 
chip is modulated onto a common chip waveform for transmission. The output of a bank of K 
chip-matched filters is given by 

$$ \mathbf{r} = [\mathbf{s}_1, \ldots, \mathbf{s}_k, \ldots, \mathbf{s}_K]\,\mathbf{d} + \mathbf{n} = \mathbf{S}\mathbf{d} + \mathbf{n}, \tag{1} $$

where $\mathbf{S} \in \{\pm 1/\sqrt{N}\}^{N \times K}$ is the spreading matrix, $\mathbf{d} \in \{\pm 1\}^K$ is the data
symbol vector, $\mathbf{n}$ is a zero-mean additive white Gaussian noise (AWGN) vector with covariance
matrix $\sigma^2 \mathbf{I}$, and $N_0 = 2\sigma^2$ is the one-sided spectral density of the white Gaussian noise.
The model is illustrated in Figure 1 within the error control coded model.

Figure 1: Discrete-time model for coded CDMA.



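We first introduce some notation that will prove useful later on. At chip interval $\mu$, the
received signal is described by $r_\mu = \sum_{k=1}^{K} s_{\mu k} d_k + n_\mu$, where $r_\mu$, $s_{\mu k}$ and $n_\mu$ are the
corresponding elements of the vectors $\mathbf{r}$, $\mathbf{s}_k$ and $\mathbf{n}$, respectively. In addition, let
$\mathbf{S}_k = [\mathbf{s}_1, \ldots, \mathbf{s}_{k-1}, \mathbf{s}_{k+1}, \ldots, \mathbf{s}_K]$ be the spreading matrix with column $k$ removed.
The model in (1) can be further developed to include bit-level matched filtering as
$\mathbf{y} = \mathbf{S}^T \mathbf{r} = \mathbf{R}\mathbf{d} + \mathbf{z}$, where $\mathbf{R} = \mathbf{S}^T\mathbf{S}$ and $E\{\mathbf{z}\mathbf{z}^T\} = \sigma^2 \mathbf{R}$. It follows that
$y_k = \sum_{j=1}^{K} R_{kj} d_j + z_k$, where $y_k$ and $z_k$ are the respective elements of the vectors $\mathbf{y}$ and
$\mathbf{z}$, while $R_{kj}$ is the corresponding element of the matrix $\mathbf{R}$.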
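
The model in (1) and its matched-filtered form can be simulated directly. The following is a minimal NumPy sketch; the function name `simulate_cdma` and its argument names are illustrative assumptions and are not taken from the original work.

```python
import numpy as np

def simulate_cdma(K, N, sigma2, rng):
    """Draw one symbol interval of the synchronous model r = S d + n (hypothetical helper)."""
    # Random binary spreading with chips in {+1/sqrt(N), -1/sqrt(N)}
    S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
    # Binary (BPSK) data symbols
    d = rng.choice([-1.0, 1.0], size=K)
    # AWGN with covariance sigma2 * I
    n = rng.normal(0.0, np.sqrt(sigma2), size=N)
    r = S @ d + n          # chip-matched filter output, eq. (1)
    R = S.T @ S            # correlation matrix of the spreading sequences
    y = S.T @ r            # symbol-level matched filter output, y = R d + z
    return S, d, r, R, y
```

With this normalization every spreading sequence has unit energy, so the diagonal of `R` equals one.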

When error control coding is introduced, the model is extended as shown in Figure 1. Now 
the binary data symbols are encoded, interleaved and mapped onto a binary phase-shift keying
constellation in order to arrive at the code symbol vector d, which corresponds to the data symbol 
vector in the model for the uncoded case. In this paper, we consider iterative multiuser decoding 
for the coded case with the corresponding decoding structure shown in Figure 2. A multiuser 
SISO detector computes extrinsic LLRs of the code bits for all the users based on the received 
signal and a priori LLRs of the code bits. The extrinsic LLRs of user k are deinterleaved and 
input to an APP decoder for the error control code applied by user k. This single-user decoder 
outputs extrinsic LLRs, which are interleaved and, together with extrinsic LLRs of all the other 
users, forwarded to the multiuser SISO as a priori LLRs for the next iteration. This type of 
iterative multiuser decoder is a direct application of the turbo decoding principle and is commonly
used for iterative multiuser decoding [13, 17, 19, 22].

Figure 2: General structure for iterative multiuser decoding.

3 Nonlinear MMSE Estimation 

Let the nonlinear MMSE data estimate for user $k$ be denoted as $m_k = g^*(d_k, \mathbf{r})$, where
$g^*(d_k, \mathbf{r})$ is the nonlinear function that minimizes the mean squared error
$E\{(d_k - g(d_k, \mathbf{r}))^2\}$. In order to find the optimal nonlinear function, the mean squared
error is expressed as an expectation of a conditional expected value,
$E\{E\{(d_k - g(d_k, \mathbf{r}))^2 \mid \mathbf{r}\}\}$ [28]. Since the inner expectation is always nonnegative,
the minimum is achieved by:

$$ \min_{g(d_k,\mathbf{r}) \in G} E\{[d_k - g(d_k,\mathbf{r})]^2 \mid \mathbf{r}\} = \min_{g(d_k,\mathbf{r}) \in G} \sum_{d_k = \pm 1} [d_k - g(d_k,\mathbf{r})]^2 \Pr(d_k \mid \mathbf{r}), \tag{2} $$

where $G$ is the relevant set of nonlinear functions. The solution is the conditional mean
$E\{d_k \mid \mathbf{r}\}$ [28], leading to

$$ m_k = g^*(d_k,\mathbf{r}) = \sum_{d_k = \pm 1} d_k \Pr(d_k \mid \mathbf{r}) = \sum_{\mathbf{d} \in \{-1,+1\}^K} d_k \Pr(\mathbf{d} \mid \mathbf{r}). \tag{3} $$

Note that the polarity of $m_k$ in eqn. (3) is in fact the marginal posterior-mode decision, i.e.

$$ d_k^* = \arg\max_{d_k} \Pr(d_k \mid \mathbf{r}) = \mathrm{sign}\Big\{ \sum_{\mathbf{d} \in \{-1,+1\}^K} d_k \Pr(\mathbf{d} \mid \mathbf{r}) \Big\}. $$

Based on eqns. (2) and (3), the NMMSE data estimates for all the users can be described by
a set of $K$ optimization problems:

$$ m_k = \arg\min_{m_k} \sum_{d_k = \pm 1} (d_k - m_k)^2 \Pr(d_k \mid \mathbf{r}), \quad \text{for } k = 1, 2, \ldots, K, $$

where $m_k$ is the NMMSE data estimate for user $k$. The $K$ problems can be solved independently
since $\Pr(d_k \mid \mathbf{r})$ can be computed independently for each user.

Following Bayes' rule, the marginal posterior-mode distribution can be found as

$$ \Pr(d_k \mid \mathbf{r}) = \frac{\Pr(d_k)\, p(\mathbf{r} \mid d_k)}{\sum_{d_k} \Pr(d_k)\, p(\mathbf{r} \mid d_k)}. \tag{4} $$

Here, the probability density function (pdf) $p(\mathbf{r} \mid d_k)$ is found as a sum over $2^{K-1}$ terms
as follows:

$$ p(\mathbf{r} \mid d_k) = \sum_{\mathbf{d} \setminus d_k} p(\mathbf{r} \mid \mathbf{d}) \Pr(\mathbf{d} \setminus d_k), \tag{5} $$

where $\mathbf{d} \setminus d_k$ denotes a vector containing all the elements in $\mathbf{d}$ except $d_k$. This approach
is however impractical for large system loads, as the computational complexity grows exponentially
with the number of users. As an alternative, a multivariate Gaussian approximation is introduced below.
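
To make the exponential cost concrete, the exact estimate in (3) can be computed by brute-force enumeration for small $K$. The sketch below is illustrative only; the name `nmmse_exact` and its interface are assumptions, and independent priors with $\Pr(d_k) \propto \exp(d_k \lambda_k^p / 2)$ are assumed.

```python
import itertools
import numpy as np

def nmmse_exact(r, S, sigma2, prior_llr=None):
    """Exact conditional mean of eq. (3) by enumerating all 2^K symbol vectors."""
    N, K = S.shape
    lam = np.zeros(K) if prior_llr is None else np.asarray(prior_llr, dtype=float)
    # All candidate symbol vectors, shape (2^K, K)
    cands = np.array(list(itertools.product([-1.0, 1.0], repeat=K)))
    # log Pr(d) up to a constant, assuming independent priors
    log_prior = 0.5 * cands @ lam
    # log p(r | d) up to a constant for white Gaussian noise
    resid = r[None, :] - cands @ S.T
    log_lik = -0.5 * np.sum(resid ** 2, axis=1) / sigma2
    log_post = log_prior + log_lik
    post = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    post /= post.sum()
    # m_k = sum over d of d_k Pr(d | r)
    return post @ cands
```

The sign of each returned entry is the marginal posterior-mode decision, and the enumeration over `2**K` candidates is what the Gaussian approximation in the next section avoids.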



4 Multivariate Gaussian Approximation 

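Consider the received signal at chip level. The conditional pdf at chip interval $\mu$ is

$$ p(r_\mu \mid \mathbf{d}) = \frac{\exp\big[-\frac{1}{2\sigma^2}(r_\mu - s_{\mu k} d_k - A_{\mu k})^2\big]}{\sqrt{2\pi\sigma^2}}, $$

where $A_{\mu k} = \sum_{j \neq k} s_{\mu j} d_j$ is the corresponding MAI. The conditional symbol-level pdf in (5)
can then be expressed as

$$ p(\mathbf{r} \mid d_k) = \sum_{\mathbf{d} \setminus d_k} \prod_{\mu=1}^{N} p(r_\mu \mid \mathbf{d}) \Pr(\mathbf{d} \setminus d_k) = \sum_{\mathbf{d} \setminus d_k \in \{-1,+1\}^{K-1}} \Pr(\mathbf{d} \setminus d_k) \prod_{\mu=1}^{N} \frac{\exp\big[-\frac{1}{2\sigma^2}(r_\mu - s_{\mu k} d_k - A_{\mu k})^2\big]}{\sqrt{2\pi\sigma^2}}, \tag{6} $$

where $\mathbf{A}_k = [A_{1k}, \ldots, A_{Nk}]^T$ is a vector for user $k$, containing the MAI contributions for each
chip interval.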

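To reduce complexity, the probability distribution function of the random vector $\mathbf{A}_k$
is approximated by a multivariate Gaussian pdf. The summation in (6) can thus be replaced by
an $N$-fold integration over the support of $\mathbf{A}_k$,

$$ p(\mathbf{r} \mid d_k) \approx \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(\mathbf{r}, \mathbf{A}_k \mid d_k)\, d\mathbf{A}_k = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \prod_{\mu=1}^{N} p(r_\mu \mid A_{\mu k}, d_k)\, p(\mathbf{A}_k)\, d\mathbf{A}_k, \tag{7} $$

where $d\mathbf{A}_k = \prod_{\mu=1}^{N} dA_{\mu k}$ denotes differentials for integration. The multivariate Gaussian pdf is
described as follows. Since $A_{\mu k} = \sum_{j \neq k} s_{\mu j} d_j$, it is reasonable to assume that the corresponding
mean and covariance are

$$ u_{\mu k} = E\{A_{\mu k}\} = \sum_{j \neq k} s_{\mu j} m_j $$

and

$$ \mathrm{Cov}\{A_{\mu k} A_{\nu k}\} = E\{A_{\mu k} A_{\nu k}\} - E\{A_{\mu k}\} E\{A_{\nu k}\} = \sum_{j \neq k} s_{\mu j} s_{\nu j} (1 - m_j^2) + \sum_{j \neq k} \sum_{i \neq j,k} s_{\mu j} s_{\nu i} \big( E\{d_j d_i\} - m_j m_i \big). \tag{8} $$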

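In the second term in (8), the expectation $E\{d_j d_i\}$ must be computed. This computation has
a complexity of the order of $O(K^2)$. To reduce complexity, the second term is omitted in the
following. As $K$ grows large, it is expected that $E\{d_j d_i\} \to m_j m_i$ and thus, the second term
becomes negligible. The effect of removing this term is considered in Section 8 using numerical
examples. With this simplification, the covariance matrix of $\mathbf{A}_k$ is reduced to

$$ \boldsymbol{\Omega}_k = \mathrm{Cov}\{\mathbf{A}_k \mathbf{A}_k^T\} = \sum_{i \neq k} (1 - m_i^2)\, \mathbf{s}_i \mathbf{s}_i^T = \mathbf{S}_k\, \mathrm{Diag}[\mathbf{1} - \mathbf{m}_k \circ \mathbf{m}_k]\, \mathbf{S}_k^T, $$

where $\mathrm{Cov}\{A_{\mu k} A_{\nu k}\} = \sum_{i \neq k} s_{\mu i} s_{\nu i} (1 - m_i^2)$, $\mathbf{m}_k = [m_1, m_2, \ldots, m_{k-1}, m_{k+1}, \ldots, m_K]^T$ and
$\mathbf{a} \circ \mathbf{b}$ denotes the Hadamard product [29] of vectors $\mathbf{a}$ and $\mathbf{b}$. The multivariate Gaussian
pdf of $\mathbf{A}_k$ is then

$$ p(\mathbf{A}_k) = \frac{\exp\big[-\frac{1}{2}(\mathbf{A}_k - \mathbf{u}_k)^T \boldsymbol{\Omega}_k^{-1} (\mathbf{A}_k - \mathbf{u}_k)\big]}{\sqrt{(2\pi)^N |\boldsymbol{\Omega}_k|}}, $$

where $\mathbf{u}_k = [u_{1k}, u_{2k}, \ldots, u_{Nk}]^T = \mathbf{S}_k \mathbf{m}_k$.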

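Substituting this into (7) and performing the $N$-fold integration yields

$$ p(\mathbf{r} \mid d_k) \propto \exp\Big\{-\tfrac{1}{2}(\mathbf{r} - \mathbf{s}_k d_k - \mathbf{u}_k)^T (\boldsymbol{\Omega}_k + \sigma^2 \mathbf{I})^{-1} (\mathbf{r} - \mathbf{s}_k d_k - \mathbf{u}_k)\Big\} \propto \exp\{d_k (\mathbf{r} - \mathbf{u}_k)^T \mathbf{C}_k^{-1} \mathbf{s}_k\} = \exp\{d_k \mathbf{s}_k^T \mathbf{C}_k^{-1} (\mathbf{r} - \mathbf{S}_k \mathbf{m}_k)\}, \tag{9} $$

where $\mathbf{C}_k = \boldsymbol{\Omega}_k + \sigma^2 \mathbf{I}$. It follows that the NMMSE estimate is given by

$$ m_k = \sum_{d_k = \pm 1} d_k \Pr(d_k \mid \mathbf{r}) = \frac{\sum_{d_k = \pm 1} d_k \Pr(d_k)\, p(\mathbf{r} \mid d_k)}{\sum_{d_k = \pm 1} \Pr(d_k)\, p(\mathbf{r} \mid d_k)} = \tanh\big[\lambda_k^p/2 + \mathbf{s}_k^T \mathbf{C}_k^{-1} (\mathbf{r} - \mathbf{S}_k \mathbf{m}_k)\big], \tag{10} $$

where $\lambda_k^p = \log[\Pr(d_k = 1)/\Pr(d_k = -1)]$ is the a priori log-likelihood ratio (LLR).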

The above detector is known as the PDA detector, first suggested in [9]. Our contribution is to
relate the PDA detector to the NMMSE estimation problem, which shows that the corresponding
output is an approximation to the conditional a posteriori mean. Also, it is clear from (10) that
the PDA detector corresponds to a nonlinear, filtered IC structure.

Solving the nonlinear system of equations in (10) requires a computational complexity of the
order of $O(K^3)$ [9], where the complexity is dominated by the inversion of $\mathbf{C}_k$. A simplified
approach is suggested below, approximating $\mathbf{C}_k$ with a diagonal matrix.
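
One way to solve (10) numerically is a serial sweep over the users, recomputing $\mathbf{C}_k$ from the current soft estimates each time. The following NumPy sketch is an illustration under that assumption; the name `pda_detect`, the fixed number of sweeps and the initialization from the prior are choices made here, not prescriptions from [9].

```python
import numpy as np

def pda_detect(r, S, sigma2, prior_llr=None, n_sweeps=10):
    """Serial fixed-point iteration of eq. (10) (illustrative sketch)."""
    N, K = S.shape
    lam = np.zeros(K) if prior_llr is None else np.asarray(prior_llr, dtype=float)
    m = np.tanh(lam / 2.0)                      # start from the prior means
    for _ in range(n_sweeps):
        for k in range(K):
            others = np.arange(K) != k
            S_k = S[:, others]                  # spreading matrix with column k removed
            m_k = m[others]
            # C_k = S_k Diag[1 - m_k o m_k] S_k^T + sigma^2 I
            C_k = (S_k * (1.0 - m_k ** 2)) @ S_k.T + sigma2 * np.eye(N)
            resid = r - S_k @ m_k               # soft interference cancellation
            # Solve C_k x = resid instead of forming the inverse explicitly
            m[k] = np.tanh(lam[k] / 2.0 + S[:, k] @ np.linalg.solve(C_k, resid))
    return m
```

The linear solve against the $N \times N$ matrix $\mathbf{C}_k$ inside the user loop is the dominant cost, which motivates the diagonal approximation of the next section.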



5 Simplified Probabilistic Data Association Detection 

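For large systems, the diagonal elements of $\mathbf{C}_k$ are dominant, encouraging the following
approximation $\mathbf{C}_k = (\boldsymbol{\Omega}_k + \sigma^2 \mathbf{I}) \approx (\sigma_k^2 + \sigma^2)\mathbf{I}$, where $\sigma_k^2 = \alpha(1 - Q)$ with $\alpha = K/N$ being the
system load and $Q = (1/K) \sum_k m_k^2$. The conditional pdf (9) is then simplified to

$$ p(\mathbf{r} \mid d_k) = \prod_{\mu=1}^{N} \frac{\exp\Big[-\frac{(r_\mu - s_{\mu k} d_k - u_{\mu k})^2}{2(\sigma_k^2 + \sigma^2)}\Big]}{\sqrt{2\pi(\sigma_k^2 + \sigma^2)}} = \frac{\exp\Big[-\frac{\|\mathbf{r} - \mathbf{s}_k d_k - \mathbf{u}_k\|^2}{2(\sigma_k^2 + \sigma^2)}\Big]}{[2\pi(\sigma_k^2 + \sigma^2)]^{N/2}} \propto \exp\Big[\frac{d_k \mathbf{s}_k^T (\mathbf{r} - \mathbf{S}_k \mathbf{m}_k)}{\sigma_k^2 + \sigma^2}\Big], \tag{11} $$

which leads to

$$ m_k = \tanh\Big[\frac{\lambda_k^p}{2} + \frac{\mathbf{s}_k^T (\mathbf{r} - \mathbf{S}_k \mathbf{m}_k)}{\sigma_k^2 + \sigma^2}\Big] = \tanh\Big[\frac{\lambda_k^p}{2} + \frac{y_k - \sum_{j \neq k} R_{kj} m_j}{\sigma^2 + \alpha(1 - Q)}\Big]. \tag{12} $$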



Note that (12) is similar to the iterative soft-decision multi-stage interference cancellation (MIC)
scheme suggested independently in [6-8]. The MIC is described by

$$ m_k = \tanh\Big[\frac{\lambda_k^p}{2} + \frac{y_k - \sum_{j \neq k} R_{kj} m_j}{\sigma^2 + \sum_{j \neq k} R_{kj}^2 (1 - m_j^2)}\Big]. \tag{13} $$

For large $K$ and $N$, the term $\sum_{j \neq k} R_{kj}^2 (1 - m_j^2)$ is well approximated by $\alpha(1 - Q)$, using the
fact that $E\{R_{kj}^2\} = 1/N$.

A simple way to solve (12) is by iteration over all users from an initial solution $\mathbf{m}^0$. This can
be done in parallel as

$$ m_k^{t+1} = \omega m_k^t + (1 - \omega) \tanh\Big[\frac{\lambda_k^p}{2} + \frac{y_k - \sum_{j \neq k} R_{kj} m_j^t}{\sigma^2 + \alpha(1 - Q^t)}\Big], \tag{14} $$

where superscript $t$ denotes the corresponding variable at iteration $t$. Also, $0 \leq \omega < 1$ is a
weighting factor which improves the convergence properties of the parallel iteration in (14).
Similar weighting factor approaches have been applied to linear cancellation and convex-constrained
cancellation in [5, 30].
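
The parallel recursion in (14) maps onto a few vectorized operations. The sketch below is a minimal illustration; the function name `pspda`, the zero initial solution and the fixed iteration count are assumptions made for the example.

```python
import numpy as np

def pspda(y, R, sigma2, alpha, omega=0.4, prior_llr=None, n_iter=50):
    """Parallel simplified PDA iteration of eq. (14) (illustrative sketch)."""
    K = y.size
    lam = np.zeros(K) if prior_llr is None else np.asarray(prior_llr, dtype=float)
    R_off = R - np.diag(np.diag(R))       # keep only the cross-correlations R_kj, j != k
    m = np.zeros(K)                       # initial solution m^0 = 0
    for _ in range(n_iter):
        Q = np.mean(m ** 2)               # Q^t = (1/K) sum_k (m_k^t)^2
        h = (y - R_off @ m) / (sigma2 + alpha * (1.0 - Q))
        # Weighted update of all users at once
        m = omega * m + (1.0 - omega) * np.tanh(lam / 2.0 + h)
    return m
```

Each iteration costs one matrix-vector product, in line with the $O(K^2)$ complexity of interference cancellation.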

The fixed-point problem in (12) can also be solved with a serial iteration as

$$ m_k^{t+1} = \omega m_k^t + (1 - \omega) \tanh\left[\frac{\lambda_k^p}{2} + \frac{y_k - \sum_{j=1}^{k-1} R_{kj} m_j^{t+1} - \sum_{j=k+1}^{K} R_{kj} m_j^t}{\sigma^2 + \alpha\Big\{1 - \frac{1}{K}\Big[\sum_{j=1}^{k-1} (m_j^{t+1})^2 + \sum_{j=k}^{K} (m_j^t)^2\Big]\Big\}}\right]. \tag{15} $$

It should be noted that convergence is not assured in general. However, for a series of numerical
experiments, it has been observed that the serial implementation with $\omega = 0$ always converged
while a nonzero weighting factor is required for the parallel case to ensure convergence.

In the following, the parallel implementation in (14) is denoted as the parallel simplified PDA
(PSPDA) and the serial implementation in (15) is denoted as the serial simplified PDA (SSPDA).
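
The serial recursion in (15) differs from the parallel one only in that each user immediately sees the most recent estimates of the users updated before it. A minimal sketch, with the hypothetical name `sspda` and a fixed number of sweeps as assumptions:

```python
import numpy as np

def sspda(y, R, sigma2, alpha, omega=0.0, prior_llr=None, n_sweeps=20):
    """Serial simplified PDA iteration of eq. (15) (illustrative sketch)."""
    K = y.size
    lam = np.zeros(K) if prior_llr is None else np.asarray(prior_llr, dtype=float)
    m = np.zeros(K)                       # initial solution m^0 = 0
    for _ in range(n_sweeps):
        for k in range(K):
            # m holds new values for users j < k and old values for j >= k
            Q = np.mean(m ** 2)
            interference = R[k] @ m - R[k, k] * m[k]   # sum over j != k of R_kj m_j
            h = (y[k] - interference) / (sigma2 + alpha * (1.0 - Q))
            m[k] = omega * m[k] + (1.0 - omega) * np.tanh(lam[k] / 2.0 + h)
    return m
```

Updating `m` in place is what makes the sweep serial; copying `m` before the inner loop would turn it back into the parallel scheme.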



6 Multiuser Decoding 

The multiuser detectors considered in this paper are based directly on estimating the marginal- 
mode probability distribution function. This feature makes these detectors well suited for low- 
complexity iterative multiuser decoding, requiring only minor modifications. Based on the gen- 
eral iterative multiuser decoding approach in [13, 19,22], the extrinsic LLRs of the detectors 
developed above are derived. 

From (4), the LLR for user $k$ based on the marginal mode probability distribution is

$$ \lambda_k^{APP} = \log\frac{\Pr(d_k = 1 \mid \mathbf{r})}{\Pr(d_k = -1 \mid \mathbf{r})} = \log\frac{\Pr(d_k = 1)\, p(\mathbf{r} \mid d_k = 1)}{\Pr(d_k = -1)\, p(\mathbf{r} \mid d_k = -1)} = \lambda_k^p + \lambda_k^e, $$

where $\lambda_k^p$ is the a priori LLR and $\lambda_k^e = \log\frac{p(\mathbf{r} \mid d_k = 1)}{p(\mathbf{r} \mid d_k = -1)}$ is the extrinsic LLR for user $k$. A
multiuser SISO based on the PDA detector is determined from (9). The corresponding LLR is
$\lambda_k^e = 2\mathbf{s}_k^T \mathbf{C}_k^{-1} (\mathbf{r} - \mathbf{S}_k \mathbf{m}_k)$, following a sufficient number of iterations of the PDA detector,
according to (10) either in parallel or serially. This is to arrive at as good an approximation
as possible to the conditional a posteriori mean. Considering the approximate conditional pdf
in (11), the corresponding LLR for a multiuser SISO based on the simplified PDA detector is
$\lambda_k^e = \frac{2\mathbf{s}_k^T (\mathbf{r} - \mathbf{S}_k \mathbf{m}_k)}{\sigma_k^2 + \sigma^2}$, again assuming sufficient iterations of (14) or (15) to get a good
approximation to $m_k$ for all $k$.

Note that we now have two separate iterations, namely the overall multiuser decoding itera- 
tion, exchanging LLRs between the multiuser SISO and the bank of single-user APP decoders, 
and the internal NMMSE-detector iteration, improving the NMMSE estimate. A further design 
parameter is the choice of the initial solution $\mathbf{m}^0$. Typical choices are $\mathbf{m}^0 = \mathbf{0}$, $\mathbf{m}^0 = \mathbf{S}^T \mathbf{r}$ or
$m_k^0 = \tanh[\lambda_k^p/2]$, $k = 1, 2, \ldots, K$, using the most recent prior LLR for user $k$.
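
Once the inner iteration has produced soft estimates, the extrinsic LLRs of the simplified PDA SISO follow from (11) in one step. The sketch below assumes the matched-filter quantities `y` and `R` from Section 2; the function name `simplified_pda_extrinsic_llr` is hypothetical.

```python
import numpy as np

def simplified_pda_extrinsic_llr(y, R, m, sigma2, alpha):
    """Extrinsic LLRs 2 s_k^T (r - S_k m_k) / (sigma_k^2 + sigma^2) from soft estimates m."""
    m = np.asarray(m, dtype=float)
    R_off = R - np.diag(np.diag(R))       # exclude the user's own contribution
    Q = np.mean(m ** 2)
    # s_k^T (r - S_k m_k) equals y_k - sum_{j != k} R_kj m_j
    return 2.0 * (y - R_off @ m) / (sigma2 + alpha * (1.0 - Q))
```

The a posteriori LLR is obtained by adding the a priori LLR, while only the extrinsic part is passed on to the single-user decoders.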

The performance of the proposed multiuser SISO detectors within an iterative multiuser de- 
coder is evaluated based on numerical examples in Section 8. 



7 Large System Performance Analysis 



In this section, large system analysis is considered for the uncoded case. The BER performance
of the PSPDA detector in (14) with uniform binary priors (i.e., $\lambda_k^p/2 = 0$) and $\mathbf{m}^0 = \mathbf{0}$ is
investigated using an approach similar to [23, 26].

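Let $h_k^t = A^t \big(y_k - \sum_{j \neq k} R_{kj} m_j^t\big)$, where $A^t = [\sigma^2 + \alpha(1 - Q^t)]^{-1}$. We can then express
(14) as

$$ m_k^{t+1} = \omega m_k^t + (1 - \omega) \tanh[h_k^t] = \sum_{\kappa=0}^{t} \rho^{t-\kappa} \tanh[h_k^\kappa], $$

where the recursion in (14) has been repeatedly applied such that

$$ \rho^{t-\kappa} = \begin{cases} \omega^{t-1} & \text{if } \kappa = 0 \\ (1 - \omega)\,\omega^{t-\kappa} & \text{if } \kappa \neq 0. \end{cases} $$

The corresponding decision at iteration $t + 1$ is given as

$$ \hat{d}_k^{t+1} = \mathrm{sign}\big(m_k^{t+1}\big) = \mathrm{sign}\Big[\sum_{\kappa=0}^{t} \rho^{t-\kappa} \tanh(h_k^\kappa)\Big] \tag{16} $$

and the BER at iteration $t + 1$ can subsequently be determined as

$$ P_b^{t+1} = \tfrac{1}{2} E\{1 - d_k \hat{d}_k^{t+1}\} = \tfrac{1}{2} E\Big\{1 - d_k\, \mathrm{sign}\Big[\sum_{\kappa=0}^{t} \rho^{t-\kappa} \tanh(h_k^\kappa)\Big]\Big\} \tag{17} $$

$$ = \tfrac{1}{2} E\Big\{1 - \mathrm{sign}\Big[\sum_{\kappa=0}^{t} \rho^{t-\kappa} \tanh(d_k h_k^\kappa)\Big]\Big\}. \tag{18} $$

Assuming that $d_k h_k^t$ is a random variable, independently sampled from a Gaussian
distribution² with mean value $E^t$ and variance $F^{t,t}$, and corresponding pdf $p_{dh}(\beta^t)$, it
follows that the BER in (18) can be determined through a $t$-fold integration as

$$ P_b^{t+1} = \frac{1}{2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Big\{1 - \mathrm{sign}\Big[\sum_{\kappa=0}^{t} \rho^{t-\kappa} \tanh(\beta^\kappa)\Big]\Big\} \prod_{i=0}^{t} p_{dh}(\beta^i)\, d\beta^i. $$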

When $\omega = 0$, (18) simplifies to

$$ P_b^{t+1} = \tfrac{1}{2} E\{1 - \mathrm{sign}[\tanh(d_k h_k^t)]\} = \tfrac{1}{2} E\{1 - \mathrm{sign}(d_k h_k^t)\} = \int_{-\infty}^{0} p_{dh}(\beta^t)\, d\beta^t = \int_{-\infty}^{-E^t/\sqrt{F^{t,t}}} Dz, \tag{19} $$

where the third equality in (19) follows from

$$ \frac{1 - \mathrm{sign}(x)}{2} = \begin{cases} 0 & x > 0 \\ 1 & x < 0 \end{cases} $$

and $Dz = dz\, \exp(-z^2/2)/\sqrt{2\pi}$. Under the assumption that the tentative decision statistics
$\{m_k^t\}$ in (14) converge to a fixed point as $t \to \infty$, $m_k^{t+1} = m_k^t = m_k$, and thus, $m_k = \tanh[h_k]$.

Consequently, the BER in steady state can be determined by (19) for any weighting factor
$0 \leq \omega < 1$ using the steady-state distribution $p_{dh}(\beta)$ with mean value $E$ and variance $F$.
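
The last integral in (19) is the Gaussian tail function evaluated at $E/\sqrt{F}$, which can be computed with the complementary error function. A minimal sketch; the name `ber_from_gaussian` is an assumption made here.

```python
import math

def ber_from_gaussian(E, F):
    """BER of eq. (19): integral of Dz from -inf to -E/sqrt(F), i.e. Q(E/sqrt(F))."""
    # Q(x) = 0.5 * erfc(x / sqrt(2)), here with x = E / sqrt(F)
    return 0.5 * math.erfc(E / math.sqrt(2.0 * F))
```

For example, $E = F = 4$ corresponds to a tail argument of 2 and a BER of about 0.0228, while $E = 0$ gives a BER of one half.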

The task is therefore to derive useful recursive expressions for $E^t$ and $F^{t,t}$. For this purpose,
we define the following parameters, $M^t$ and $Q^t$. These parameters turn out to be closely related
to $E^t$ and $F^{t,t}$.

$$ M^{t+1} = E\{d_k m_k^{t+1}\} = \omega E\{d_k m_k^t\} + (1 - \omega) E\{\tanh(d_k h_k^t)\} = \omega M^t + (1 - \omega) I^t, \tag{20} $$

and

$$ Q^{t+1} = E\{(m_k^{t+1})^2\} = \omega^2 E\{(m_k^t)^2\} + (1 - \omega)^2 E\{\tanh^2(d_k h_k^t)\} + 2\omega(1 - \omega) E\{m_k^t \tanh(h_k^t)\} = -\omega^2 Q^t + 2\omega Q^{t+1,t} + (1 - \omega)^2 J^t, \tag{21} $$

where

$$ I^t = \int_{-\infty}^{\infty} \tanh(\beta^t)\, p_{dh}(\beta^t)\, d\beta^t = \int_{-\infty}^{\infty} \tanh\big(z\sqrt{F^{t,t}} + E^t\big)\, Dz, $$

$$ J^t = \int_{-\infty}^{\infty} \tanh^2(\beta^t)\, p_{dh}(\beta^t)\, d\beta^t = \int_{-\infty}^{\infty} \tanh^2\big(z\sqrt{F^{t,t}} + E^t\big)\, Dz. $$

² This assumption becomes increasingly valid as $K, N \to \infty$ with $K/N = \alpha$.

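The correlation $Q^{t+1,\tau}$ is given by

$$ Q^{t+1,\tau} = \omega E\{m_k^t m_k^\tau\} + (1 - \omega) E\{m_k^\tau \tanh(h_k^t)\} = \omega Q^{t,\tau} + (1 - \omega) \sum_{\kappa=0}^{\tau-1} \rho^{\tau-1-\kappa} E\{\tanh(h_k^t) \tanh(h_k^\kappa)\}. \tag{22} $$

In order to get an expression for $Q^{t+1,\tau}$, we need to derive an expression for
$E\{\tanh(h_k^t) \tanh(h_k^\kappa)\}$. We first note that $(d_k h_k^t, d_k h_k^\kappa)$ has a joint Gaussian probability
distribution function with

$$ E\{(d_k h_k^t, d_k h_k^\kappa)\} = (E^t, E^\kappa), \qquad \mathrm{Cov}(d_k h_k^t, d_k h_k^\kappa) = \begin{bmatrix} F^{t,t} & F^{t,\kappa} \\ F^{t,\kappa} & F^{\kappa,\kappa} \end{bmatrix}. $$

Rewriting $d_k h_k^t$ and $d_k h_k^\kappa$ in terms of three independent, zero-mean, unit-variance Gaussian
random variables $\{a, b, c\}$, and the statistics above, we get

$$ d_k h_k^t = \sqrt{F^{t,t}}\,\big(a \Upsilon^{t,\kappa} + c\, \Gamma^{t,\kappa}\big) + E^t \quad \text{and} \quad d_k h_k^\kappa = \sqrt{F^{\kappa,\kappa}}\,\big(b \Upsilon^{t,\kappa} + c\, \Gamma^{t,\kappa}\big) + E^\kappa, $$

where

$$ \Upsilon^{t,\kappa} = \sqrt{1 - \frac{F^{t,\kappa}}{\sqrt{F^{t,t} F^{\kappa,\kappa}}}} \quad \text{and} \quad \Gamma^{t,\kappa} = \sqrt{\frac{F^{t,\kappa}}{\sqrt{F^{t,t} F^{\kappa,\kappa}}}}. $$

It follows that

$$ E\{\tanh(h_k^t) \tanh(h_k^\kappa)\} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \tanh\Big[\sqrt{F^{t,t}}\,\big(a \Upsilon^{t,\kappa} + c\, \Gamma^{t,\kappa}\big) + E^t\Big] \times \tanh\Big[\sqrt{F^{\kappa,\kappa}}\,\big(b \Upsilon^{t,\kappa} + c\, \Gamma^{t,\kappa}\big) + E^\kappa\Big]\, Da\, Db\, Dc. $$

Thus, in order to determine $Q^{t+1,\tau}$ we need to determine the covariance between $d_k h_k^t$ and
$d_k h_k^\tau$, denoted by $F^{t,\tau}$.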

In the large-system limit, the sample mean converges to the ensemble expectation. Exploiting
that at stage $t$, $d_k h_k^t$ is independently sampled, we can then determine the mean, variance and
covariance as

$$ E^t = E\{d_k h_k^t\} = \frac{1}{K} \sum_{k=1}^{K} d_k h_k^t, \quad \text{for } K \to \infty, $$

$$ F^{t,t} = \mathrm{Var}\{d_k h_k^t\} = \frac{1}{K} \sum_{k=1}^{K} (h_k^t)^2 - (E^t)^2, \quad \text{for } K \to \infty, $$

$$ F^{t,\tau} = \mathrm{Cov}(d_k h_k^t, d_k h_k^\tau) = \frac{1}{K} \sum_{k=1}^{K} h_k^t h_k^\tau - \frac{1}{K^2} \sum_{j=1}^{K} \sum_{i=1}^{K} d_j h_j^t\, d_i h_i^\tau, \quad \text{for } K \to \infty. $$



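Considering the correlation between $d_j$, $R_{kj}$ and $m_k$, we can use methods from statistical
neuro-dynamics [24,25] to determine $E^t$, $F^{t,t}$ and $F^{t,\tau}$. Recently, this method has been applied
to analyze the performance of the parallel cancellation detector in [26]. The output $h_k^t$ can be
expressed as

$$ h_k^t = A^t \mathbf{s}_k^T (\mathbf{r} - \mathbf{S}_k \mathbf{m}_k^t) = A^t \sum_{\mu=1}^{N} s_{\mu k} \Big(r_\mu - \sum_{j \neq k} s_{\mu j} m_j^t\Big) = \frac{A^t d_k}{\sqrt{N}} \sum_{\mu=1}^{N} z_{\mu k}^t, \tag{23} $$

where

$$ z_{\mu k}^t = \sqrt{N} d_k s_{\mu k} r_\mu - \sqrt{N} d_k s_{\mu k} \sum_{j \neq k} s_{\mu j} m_j^t $$

$$ = \sqrt{N} d_k s_{\mu k} r_\mu - \omega \sqrt{N} d_k s_{\mu k} \sum_{j \neq k} s_{\mu j} m_j^{t-1} - (1 - \omega) \sqrt{N} d_k s_{\mu k} \sum_{j \neq k} s_{\mu j} \tanh(h_j^{t-1}) $$

$$ = \omega z_{\mu k}^{t-1} + (1 - \omega) \Big[\sqrt{N} d_k s_{\mu k} r_\mu - \sqrt{N} d_k s_{\mu k} \sum_{j \neq k} s_{\mu j} \tanh(h_j^{t-1})\Big]. \tag{24} $$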

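As we aim to use (23) in determining $E^t$ and $F^{t,t}$, the derivations are complicated by $s_{\mu j}$ and
$\tanh(h_j^{t-1})$ being statistically dependent. To obtain a recursive relation, the terms $\tanh(h_j^{t-1})$
are therefore expanded to separate the dependence of $\tanh(h_j^{t-1})$ and $s_{\mu j}$. This can be achieved
via a Taylor expansion, $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$, as follows

$$ \tanh(h_j^{t-1}) \approx \tanh(\tilde{h}_j^{t-1}) + \mathrm{sech}^2(\tilde{h}_j^{t-1})\,\big(h_j^{t-1} - \tilde{h}_j^{t-1}\big) $$

$$ = \tanh(\tilde{h}_j^{t-1}) + \mathrm{sech}^2(\tilde{h}_j^{t-1})\, A^{t-1} \Big[\sum_{i \neq j} s_{\mu j} s_{\mu i} (d_i - m_i^{t-1}) + s_{\mu j} n_\mu\Big], $$

where $\tilde{h}_j^{t-1}$ is chosen such that it contains no terms with $s_{\mu j}$,

$$ \tilde{h}_j^{t-1} = A^{t-1} \Big[d_j + \sum_{\nu \neq \mu} \sum_{i \neq j} s_{\nu j} s_{\nu i} (d_i - m_i^{t-1}) + \sum_{\nu \neq \mu} s_{\nu j} n_\nu\Big], $$

and $\mathrm{sech}(x) = 1/\cosh(x)$. The term $\sum_{j \neq k} s_{\mu j} \tanh(h_j^{t-1})$ in (24) can now be expressed as

$$ \sum_{j \neq k} s_{\mu j} \tanh(h_j^{t-1}) \approx \sum_{j \neq k} s_{\mu j} \tanh(\tilde{h}_j^{t-1}) + \sum_{j \neq k} s_{\mu j}^2\, \mathrm{sech}^2(\tilde{h}_j^{t-1})\, A^{t-1} \Big[\sum_{i \neq j} s_{\mu i} (d_i - m_i^{t-1}) + n_\mu\Big] \tag{25} $$

$$ \approx \sum_{j \neq k} s_{\mu j} \tanh(\tilde{h}_j^{t-1}) + \alpha U^{t-1} A^{t-1} \Big[r_\mu - \sum_{i} s_{\mu i} m_i^{t-1}\Big], \tag{26} $$

where

$$ U^{t-1} = \int_{-\infty}^{\infty} \mathrm{sech}^2\big(z\sqrt{F^{t-1,t-1}} + E^{t-1}\big)\, Dz. \tag{27} $$

In the second term in the step from (25) to (26), the two summations have been extended over all
$j$ and $i$, respectively, simplifying the derivations below. In the large system limit, these few extra
terms included in the summations do not affect the final results.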
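Substituting (26) into (24), we have

$$ z_{\mu k}^t = \omega z_{\mu k}^{t-1} + (1 - \omega) \Big[\bar{z}_{\mu k}^t - \alpha U^{t-1} A^{t-1} \Big(\sqrt{N} d_k s_{\mu k} r_\mu - \sqrt{N} d_k s_{\mu k} \sum_{i \neq k} s_{\mu i} m_i^{t-1}\Big)\Big] $$

$$ = \omega z_{\mu k}^{t-1} + (1 - \omega) \big[\bar{z}_{\mu k}^t - \alpha U^{t-1} A^{t-1} z_{\mu k}^{t-1}\big], \tag{28} $$

where

$$ \bar{z}_{\mu k}^t = \sqrt{N} d_k s_{\mu k} r_\mu - \sqrt{N} d_k s_{\mu k} \sum_{j \neq k} s_{\mu j} \tanh(\tilde{h}_j^{t-1}). $$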



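With $\mathbf{m}^0 = \mathbf{0}$, we can find $z_{\mu k}^0 = \sqrt{N} d_k s_{\mu k} r_\mu$ and then, using (28) recursively, we can
determine $z_{\mu k}^t$. Letting $B^t = E\{\sqrt{N} z_{\mu k}^t\}$, we can also use (28) to arrive at the following recursive
relationship

$$ B^t = \omega B^{t-1} + (1 - \omega) \big[1 - \alpha U^{t-1} A^{t-1} B^{t-1}\big], \tag{29} $$

where $E\{\sqrt{N} \bar{z}_{\mu k}^t\} = 1$, since $s_{\mu j}$ and $\tanh(\tilde{h}_j^{t-1})$ are approximately independent.

Finally, using (28), the covariance of $z_{\mu k}^t$ and $z_{\mu k}^\tau$ is given as

$$ C^{t,\tau} = \mathrm{Cov}\{z_{\mu k}^t z_{\mu k}^\tau\} = \omega^2 C^{t-1,\tau-1} + \omega(1 - \omega) \big[E\{z_{\mu k}^{t-1} \bar{z}_{\mu k}^\tau\} - \alpha U^{\tau-1} A^{\tau-1} C^{t-1,\tau-1}\big] $$

$$ + \omega(1 - \omega) \big[E\{z_{\mu k}^{\tau-1} \bar{z}_{\mu k}^t\} - \alpha U^{t-1} A^{t-1} C^{t-1,\tau-1}\big] $$

$$ + (1 - \omega)^2 \big[V^{t,\tau} - \alpha U^{\tau-1} A^{\tau-1} E\{z_{\mu k}^{\tau-1} \bar{z}_{\mu k}^t\} - \alpha U^{t-1} A^{t-1} E\{z_{\mu k}^{t-1} \bar{z}_{\mu k}^\tau\} + \alpha^2 U^{t-1} A^{t-1} U^{\tau-1} A^{\tau-1} C^{t-1,\tau-1}\big], \tag{30} $$

where

$$ V^{t,\tau} = E\{\bar{z}_{\mu k}^t \bar{z}_{\mu k}^\tau\} = \alpha + \sigma^2 - \alpha I^{t-1} - \alpha I^{\tau-1} + \alpha E\{f(h_k^{t-1}) f(h_k^{\tau-1})\}, \tag{31} $$

with $f(\cdot) = \tanh(\cdot)$ as in the Taylor expansion above.

The two remaining terms $E\{z_{\mu k}^{t-1} \bar{z}_{\mu k}^\tau\}$ and $E\{z_{\mu k}^{\tau-1} \bar{z}_{\mu k}^t\}$ can be determined recursively from
$E\{\bar{z}_{\mu k}^t \bar{z}_{\mu k}^\tau\}$. These derivations are straightforward and have been omitted to save space.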

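Now we have all the terms required to determine the mean $E^t$ and the covariance $F^{t,\tau}$. Since
$h_k^t = A^t d_k \sum_{\mu} z_{\mu k}^t / \sqrt{N}$, it follows that $E^t$ and $F^{t,\tau}$ are given by

$$ E^t = E\{d_k h_k^t\} = E\Big\{A^t \sum_{\mu} z_{\mu k}^t / \sqrt{N}\Big\} = A^t B^t \tag{32} $$

and

$$ F^{t,\tau} = \mathrm{Cov}\{d_k h_k^t, d_k h_k^\tau\} = A^t A^\tau\, \mathrm{Cov}\Big\{\sum_{\mu} z_{\mu k}^t, \sum_{\nu} z_{\nu k}^\tau\Big\} / N = A^t A^\tau\, \mathrm{Cov}\{z_{\mu k}^t z_{\mu k}^\tau\} = A^t A^\tau C^{t,\tau}, \tag{33} $$

respectively. Note that for $\omega = 0$ and as $U^t \to 0$, (32) and (33) tend to

$$ E^t = \frac{1}{\sigma^2 + \alpha(1 - Q^t)}, \tag{34} $$

$$ F^{t,t} = \frac{\alpha(1 - 2M^t + Q^t) + \sigma^2}{[\sigma^2 + \alpha(1 - Q^t)]^2}, \tag{35} $$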

respectively. It has been observed that $U^t \to 0$ when $E^t$ and $F^{t,t}$ increase. More importantly,
equations (20), (21), (34) and (35) are identical to the fixed point iterations of the saddle point
equations found by the replica method analysis for optimal detection [20]. Hence, the expressions
obtained above link the simplified PDA detector to the replica analysis of the equilibrium
state presented in [20] for uniform binary priors. Based on the large system analysis in this
section, we conclude that the simplified PDA detector approaches the performance of the optimal
detector as $K$ and $N$ grow large with $\alpha = K/N$ and transmission is conducted at a sufficiently
large $E_b/N_0$.
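
For $\omega = 0$, the recursion formed by (20), (21), (34) and (35) can be iterated numerically, with the Gaussian integrals $I^t$ and $J^t$ evaluated by Gauss-Hermite quadrature. The sketch below is an illustration under those assumptions; the name `ra_ber_recursion`, the quadrature order and the initial values $M^0 = Q^0 = 0$ (corresponding to $\mathbf{m}^0 = \mathbf{0}$) are choices made here.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def ra_ber_recursion(alpha, sigma2, n_stages=20, n_nodes=64):
    """Iterate (20), (21), (34), (35) with omega = 0 and return the BER per stage."""
    z, w = hermegauss(n_nodes)            # nodes and weights for the weight exp(-z^2/2)
    w = w / math.sqrt(2.0 * math.pi)      # normalize so that sum(w) = 1 (Gaussian measure Dz)
    M, Q = 0.0, 0.0                       # statistics of the initial solution m^0 = 0
    ber = []
    for _ in range(n_stages):
        denom = sigma2 + alpha * (1.0 - Q)
        E = 1.0 / denom                                          # eq. (34)
        F = (alpha * (1.0 - 2.0 * M + Q) + sigma2) / denom ** 2  # eq. (35)
        ber.append(0.5 * math.erfc(E / math.sqrt(2.0 * F)))      # eq. (19)
        t = np.tanh(z * math.sqrt(F) + E)
        M = float(w @ t)                  # eq. (20) with omega = 0: M^{t+1} = I^t
        Q = float(w @ t ** 2)             # eq. (21) with omega = 0: Q^{t+1} = J^t
    return ber
```

With $\alpha = 0$ the recursion reduces to the single-user case, $E = F = 1/\sigma^2$, which gives a simple sanity check.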

Finally, under the assumption that the tentative decisions $\{m_k^t\}$ in (14) converge as $t \to \infty$,
we can regard all quantities as being independent of the indices $t$ and $\tau$. Following from (20),
(21), (27), (29)-(33), the equilibrium conditions are then given by

$$ M = I = \int \tanh\big(z\sqrt{F} + E\big)\, Dz \tag{36} $$

$$ Q = J = \int \tanh^2\big(z\sqrt{F} + E\big)\, Dz \tag{37} $$

$$ U = \int \mathrm{sech}^2\big(z\sqrt{F} + E\big)\, Dz \tag{38} $$

$$ A = \frac{1}{\sigma^2 + \alpha(1 - Q)} \tag{39} $$

$$ E = \frac{A}{1 + \alpha U A} \tag{40} $$

$$ F = \frac{A^2 [\sigma^2 + \alpha(1 - 2M + Q)]}{(1 + \alpha U A)^2} \tag{41} $$

With initial values for M, Q and U, we can then recursively find the steady- state solution to the 
above equations, leading to a numerical approach determining the large-system E and F, and 
thus the corresponding large-system BER performance. 
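
A direct way to search for a steady-state solution is to substitute (36)-(41) into each other repeatedly. The following sketch is one possible realization; the name `equilibrium`, the default initial values and the fixed iteration count are assumptions, and convergence to a particular fixed point is not guaranteed.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def equilibrium(alpha, sigma2, init=(0.0, 0.0, 1.0), n_iter=200, n_nodes=80):
    """Successive substitution on the equilibrium conditions (36)-(41)."""
    z, w = hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)      # Gaussian measure Dz
    M, Q, U = init
    for _ in range(n_iter):
        A = 1.0 / (sigma2 + alpha * (1.0 - Q))                                           # eq. (39)
        E = A / (1.0 + alpha * U * A)                                                    # eq. (40)
        F = A ** 2 * (sigma2 + alpha * (1.0 - 2.0 * M + Q)) / (1.0 + alpha * U * A) ** 2 # eq. (41)
        t = np.tanh(z * math.sqrt(F) + E)
        M = float(w @ t)                  # eq. (36)
        Q = float(w @ t ** 2)             # eq. (37)
        U = float(w @ (1.0 - t ** 2))     # eq. (38), using sech^2 = 1 - tanh^2
    return M, Q, U, A, E, F
```

Since several fixed points may exist at high load, running the iteration from different initial values and comparing the resulting BER $\int_{-\infty}^{-E/\sqrt{F}} Dz$ is one way to select among them.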



8 Numerical Results 

In this section we illustrate the results above through numerical examples. First, the empirical
pdf of $\mathrm{Cov}\{A_{\mu k} A_{\nu k}\}$ in (8) is investigated. Figure 3 shows the empirical pdf with and without
the second term in (8). For a lightly loaded system ($\alpha = 0.25$), omitting the second term has
only a minor effect on the pdf as seen in Figure 3(a). The difference is more pronounced when
the load increases to 1, as shown in Figure 3(b). Here, we can only simulate systems with a small
number of users ($K = 16$) due to the computational complexity of determining the optimal
marginal posterior-mode mean values $m_k$. We expect the difference between the exact covariance
and the approximation to be reduced when $K$ and $N$ increase.

Figure 3: Empirical pdf of $\mathrm{Cov}\{A_{\mu k} A_{\nu k}\}$ for different $\alpha$. (a) $\alpha = 0.25$. (b) $\alpha = 1$.

Now we consider the large system BER estimates derived for the PSPDA ($\omega = 0$) through
the replica analysis (RA) and statistical neurodynamics (SN) approach in Figure 4. The BER
estimates for the SN approach are obtained from iterating (20), (21), (32) and (33), whilst the
BER estimates for the RA approach are obtained from iterating (20), (21), (34) and (35). When
the load is small ($\alpha = 0.1$ in Figure 4(a)), the simulated BER performance coincides with the
estimates from the SN and RA approaches. As the load increases to 0.5 in Figure 4(b), the
simulated BER performance does not follow the SN and RA predictions in the first few stages, but it
does converge to the estimates given by the SN and RA approaches.

In Figure 5, the BER performance of BP [23], PSPDA (u = 0.4) and the SSPDA detectors is 
compared to the RA and SN predicted performance for an uncoded CDMA system with a — 1. 
Convergence is considered achieved when max \m\ — m^T 1 1 < 10~ 3 or the number of iterations 
has exceeded 100. Table 1 shows the average number of stages required for convergence. As 
Eb/N increases, the SSPDA detector converges faster and hence requires the least computa- 
tional complexity. As the load increases to 1, simple iterations of (20), (21), (32) and (33) do 
not yield the desired BER estimates for the SN approach as it get attracted to fixed points which 
yield poorer BER performance. The estimates from SN approach are obtained by searching fixed 
(a) Comparison of replica analysis (RA), statistical neurodynamics (SN) and simulation results for 
E_b/N_0 = 6, 7, 8, 9 dB, K = 512, and α = 0.1. 

(b) Comparison of replica analysis (RA), statistical neurodynamics (SN) and simulation results for 
E_b/N_0 = 6, 7, 8, 9 dB, K = 512, and α = 0.5. 

Figure 4: BER approximation. 



(a) Small system: BER performance of BP, SSPDA, PSPDA (ω = 0.4) and PDA. 

(b) Large system: BER performance of BP, SSPDA and PSPDA (ω = 0.4). 

Figure 5: Comparison of BER performance of the BP, PSPDA, SSPDA and PDA detectors for 
uncoded systems with uniform prior probabilities. 
E_b/N_0 (dB)     0     1     2     3     4     5     6     7     8     9
SSPDA         12.5  23.6  77.9  62.4  48.9  27.1  14.4   8.9   6.8   5.8
PSPDA         31.3  58.0  99.0  99.0  88.1  62.4  36.6  24.1  19.1  17.1
BP            16.4  20.3  27.3  39.1  50.0  39.6  23.1  14.4  10.0   7.7

Table 1: Average number of stages required for convergence for K = 512 and α = 1. 

points for the nonlinear equilibrium ((36)-(41)) which minimize the BER for each E_b/N_0. In 
Figure 5(a) it is observed, as expected, that for a small system (K = 32), the BP, PSPDA and 
SSPDA detectors do not attain the BER performance predicted by the RA. At large E_b/N_0, 
these detectors fail to provide a useful level of performance. In contrast, when the number of 
users is large (K = 512), the BER performance of the BP, PSPDA and SSPDA detectors 
coincides with the RA prediction, as shown in Figure 5(b). It is also noted that the serial SSPDA 
converges faster than the BP detector, which is implemented in parallel, while the PSPDA detector 
converges more slowly than the BP detector. 
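
The stopping rule behind the stage counts in Table 1 (largest change in the soft estimates below 10^-3, or a cap of 100 stages) can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the simulations: the function name `run_until_converged`, the callable `update` standing in for one detector stage, and the exact way the stage cap is counted are all assumptions.

```python
import numpy as np

def run_until_converged(update, m0, tol=1e-3, max_stages=100):
    """Iterate m <- update(m) until max_k |m_new[k] - m[k]| < tol.

    `update` is a hypothetical stand-in for one stage of an iterative
    detector (e.g. one sweep over all users); it is not defined in the text.
    Returns the final estimates and the number of stages that were run.
    """
    m = np.asarray(m0, dtype=float)
    for stage in range(1, max_stages + 1):
        m_new = np.asarray(update(m), dtype=float)
        # Convergence test: largest absolute change over all users.
        if np.max(np.abs(m_new - m)) < tol:
            return m_new, stage
        m = m_new
    # Assumed behaviour: give up once the stage cap is reached.
    return m, max_stages

# Toy usage with a contracting map (not a detector): the change halves each stage.
m_final, stages = run_until_converged(lambda m: 0.5 * m, [1.0, -1.0])
print(stages)  # 10, since 0.5**10 is the first change below 1e-3
```

Averaging the returned stage count over many channel realizations would give a quantity of the kind reported in Table 1, under the assumptions stated above.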

In Figure 6, we compare the BER performance of the PDA [19], the parallel interference 
canceller (PIC) in [11, 31], the serial SSPDA (15), the serial MIC (13) and the BP detector [23] 
in a coded CDMA system where each user applies a (5, 7) convolutional code, the processing 
gain is N = 16, the interleaver size is 1000 information bits per user and iterative multiuser 
detection is done as in [13, 19, 22]. The SSPDA, BP and MIC detectors are implemented with 3 
stages each. The BP detector converges faster than the SSPDA and MIC detectors. Since it is a 
small system, the MIC detector is expected to perform better than the SSPDA detector, which is 
confirmed in Figure 6, where the MIC detector approaches single-user performance faster than 
the SSPDA detector. It is noteworthy that the two additional stages of the detectors do improve 
the BER performance. For K = 28, the MIC, BP and SSPDA detectors each require 7 iterations 
of message passing to approach single-user performance. The PDA detector also 
achieves single-user performance with 6 iterations, but is more computationally intensive. However, 
the PDA detector converges more slowly than the BP detector when the number of users increases beyond 30. 
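
As an illustration of the channel code used in the coded system, the following Python sketch encodes a bit sequence with a rate-1/2, constraint-length-3 feedforward convolutional code, assuming that "(5, 7)" denotes the generator polynomials in octal. The function name, the output ordering (generator 5 first, then generator 7) and the zero initial state without termination bits are assumptions made for illustration; the text does not specify them.

```python
def conv_encode_5_7(bits):
    """Encode a list of 0/1 bits with generators (5, 7) in octal.

    Generator 5 = 101 (binary) taps the current bit and the bit two steps back.
    Generator 7 = 111 (binary) taps the current bit and both stored bits.
    Assumed conventions: zero initial state, no tail bits, output order (g5, g7).
    """
    s1 = 0  # previous input bit
    s2 = 0  # input bit two steps back
    out = []
    for b in bits:
        out.append(b ^ s2)        # output of generator 5 (101)
        out.append(b ^ s1 ^ s2)   # output of generator 7 (111)
        s1, s2 = b, s1            # shift the register
    return out

# Impulse response: a single 1 followed by zeros reproduces the generator taps.
print(conv_encode_5_7([1, 0, 0]))  # [1, 1, 0, 1, 1, 1]
```

Each information bit produces two coded bits, so under these assumptions a block of 1000 information bits per user corresponds to 2000 coded bits before interleaving.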



(a) PIC and PDA: BER versus number of iterations for PDA and PIC with CC(5,7) and N = 16. 
Curves: K = 28, 30, 32, 34 (PDA); K = 28, 30 (PIC); SU. 

(b) BP, MIC and SSPDA: BER versus number of iterations for SSPDA, MIC and BP with CC(5,7) and N = 16. 
Curves: K = 28, 30, 32 (SSPDA); K = 28, 30, 32 (MIC); K = 28, 30, 32, 34 (BP); SU. 

Figure 6: Comparison of BER performance of the PDA, PIC, SSPDA, MIC and BP detectors for 
coded systems. 

9 Conclusions 



In this paper we have used a multivariate Gaussian approximation of the MAI to obtain a nonlinear 
MMSE estimate of the transmitted bits in a multiuser system. The assumption that the MAI 
is a multivariate Gaussian random variable leads to approximate expressions for the marginal 
posterior-mode that are identical to those describing the probabilistic data association (PDA) detector. Thus, the 
nonlinear MMSE framework provides an alternative justification for the PDA detector structure. 
A simplified PDA detector is found through a diagonal approximation of the matrix to be inverted and is recognized 
as having the same structure as previously suggested soft cancellation schemes. This simplified 
structure lends itself to large system analysis, which is found to be closely related to the replica 
method analysis for the optimal detector, and it follows that the simplified PDA has the same 
predicted large system performance as the optimal detector. As the PDA-based detectors can 
output estimates of extrinsic probabilities directly, they are well suited for iterative multiuser 
decoding and are found to provide single-user performance at high loads. In coded systems, it 
is noted that the additional stages of the simplified PDA do improve the BER performance, in 
contrast to traditional interference cancellation. 
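
To make "high loads" concrete, assume the usual definition of the load as the ratio of the number of users to the processing gain, α = K/N (an assumption here, although it is consistent with how α is used in the numerical results). For the coded examples with N = 16 and K = 28, 30, 32, 34, this gives:

\[
\alpha = \frac{K}{N}: \qquad \frac{28}{16} = 1.75, \quad \frac{30}{16} = 1.875, \quad \frac{32}{16} = 2, \quad \frac{34}{16} = 2.125 .
\]

Under this definition, all of the coded examples operate with more users than chips per symbol.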